
    CoreTSAR: Task Scheduling for Accelerator-aware Runtimes

    Heterogeneous supercomputers that incorporate computational accelerators such as GPUs are increasingly popular due to their high peak performance, energy efficiency, and comparatively low cost. Unfortunately, the programming models and frameworks designed to extract performance from all computational units still lack the flexibility of their CPU-only counterparts. Accelerated OpenMP improves this situation by supporting natural migration of OpenMP code from CPUs to a GPU. However, these implementations currently lose one of OpenMP’s best features, its flexibility: typical OpenMP applications can run on any number of CPUs, but GPU implementations do not transparently employ multiple GPUs on a node or a mix of GPUs and CPUs. To address these shortcomings, we present CoreTSAR, our runtime library for dynamically scheduling tasks across heterogeneous resources, and propose straightforward extensions that incorporate this functionality into Accelerated OpenMP. We show that our approach can provide nearly linear speedup to four GPUs over using only CPUs or one GPU, while increasing the overall flexibility of Accelerated OpenMP.
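    The core idea of scheduling a parallel loop across a mix of CPUs and GPUs can be sketched as splitting iterations in proportion to each device's measured throughput. The following is a minimal illustration of that general technique, not CoreTSAR's actual scheduler; the device names and throughput numbers are hypothetical.

```python
# Illustrative throughput-proportional work partitioning across
# heterogeneous devices (a sketch, not CoreTSAR's implementation).

def partition(total_iters, throughputs):
    """Split loop iterations proportionally to each device's measured
    throughput (iterations/second), giving rounding leftovers to the
    fastest device."""
    total_tp = sum(throughputs.values())
    shares = {dev: int(total_iters * tp / total_tp)
              for dev, tp in throughputs.items()}
    # Hand leftover iterations (lost to rounding down) to the fastest device.
    leftover = total_iters - sum(shares.values())
    fastest = max(throughputs, key=throughputs.get)
    shares[fastest] += leftover
    return shares

# Hypothetical example: one CPU and two GPUs with measured throughputs.
print(partition(1000, {"cpu": 100.0, "gpu0": 450.0, "gpu1": 450.0}))
```

    A dynamic scheduler would re-measure throughput each iteration block and re-partition, which is how such runtimes adapt to load imbalance.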

    Power efficient job scheduling by predicting the impact of processor manufacturing variability

    Modern CPUs suffer from performance and power consumption variability due to the manufacturing process. As a result, systems that do not account for such manufacturing variability suffer performance degradation and wasted power. To avoid this negative impact, users and system administrators must actively counteract any manufacturing variability. In this work we show that parallel systems benefit from taking the consequences of manufacturing variability into account when making scheduling decisions at the job scheduler level. We also show that it is possible to predict the impact of this variability on specific applications by using variability-aware power prediction models. Based on these power models, we propose two job scheduling policies that consider the effects of manufacturing variability for each application and that ensure that power consumption stays under a system-wide power budget. We evaluate our policies under different power budgets and traffic scenarios, consisting of both single- and multi-node parallel applications, utilizing up to 4096 cores in total. We demonstrate that they decrease job turnaround time by up to 31% compared to contemporary scheduling policies used on production clusters, while saving up to 5.5% energy.
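    The budget-enforcement part of such a policy can be sketched as greedy admission: start queued jobs in priority order only while the sum of their predicted power draws stays under the system-wide budget. This is an illustrative sketch under assumed interfaces; the paper's actual policies and variability-aware power models are more sophisticated.

```python
# Sketch of power-budget-aware job admission (hypothetical interface).

def admit_jobs(queue, predicted_power, budget, running_power=0.0):
    """Greedily start queued jobs in priority order while the sum of
    their predicted power draws stays within the system-wide budget.

    queue           -- job ids in priority order
    predicted_power -- job id -> predicted power (W); in a
                       variability-aware system this prediction differs
                       per node even for identical nominal hardware
    """
    started = []
    used = running_power
    for job in queue:
        if used + predicted_power[job] <= budget:
            started.append(job)
            used += predicted_power[job]
    return started, used

# Hypothetical example: j3 must wait because j1 and j2 fill the budget.
started, used = admit_jobs(["j1", "j2", "j3"],
                           {"j1": 300.0, "j2": 450.0, "j3": 200.0},
                           budget=800.0)
```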

    Parallelizing Heavyweight Debugging Tools with MPIecho *

    Idioms created for debugging execution on single processors and multicore systems have been successfully scaled to thousands of processors, but there is little hope that this class of techniques can continue to be scaled out to tens of millions of cores. In order to allow development of more scalable debugging idioms we introduce MPIecho, a novel runtime platform that enables cloning of MPI ranks. Given identical execution on each clone, we then show how heavyweight debugging approaches can be parallelized, reducing their overhead to a fraction of the serialized case. We also show how this platform can be useful in isolating the source of hardware-based nondeterministic behavior and provide a case study based on a recent processor bug at LLNL. While total overhead will depend on the individual tool, we show that the platform itself contributes little: 512x tool parallelization incurs at worst 2x overhead across the NAS Parallel Benchmarks, and hardware fault isolation contributes at worst an additional 44% overhead. Finally, we show how MPIecho can lead to near-linear reduction in overhead when combined with Maid, a heavyweight memory tracking tool provided with Intel's Pin platform. We demonstrate overhead reduction from 1,466% to 53% and from 740% to 14% for cg.D.64 and lu.D.64, respectively, using only an additional 64 cores.
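    The overhead arithmetic behind rank cloning can be illustrated with a simple model: if each of N clones runs 1/N of the tool's instrumentation, the serial tool overhead divides by N, plus a fixed platform cost for running the clones. This is a back-of-the-envelope sketch with made-up numbers, not the paper's measured model.

```python
# Back-of-the-envelope model of tool-overhead parallelization via rank
# cloning (illustrative; real overheads depend on the tool and platform).

def cloned_overhead(tool_overhead, clones, platform_overhead=0.0):
    """If each of `clones` copies of a rank executes 1/clones of the
    tool's instrumentation, the serial overhead divides by the clone
    count, plus a fixed cost for running the cloning platform itself.
    Overheads are fractions of base runtime (e.g. 12.8 means 1,280%)."""
    return tool_overhead / clones + platform_overhead

# Hypothetical: a tool with 1,280% serial overhead, 64 clones, and an
# assumed 10% platform cost leaves roughly 30% overhead.
print(round(cloned_overhead(12.80, 64, 0.10), 2))
```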

    Movements of marine fish and decapod crustaceans: Process, theory and application

    Many marine species have a multi-phase ontogeny, with each phase usually associated with a spatially and temporally discrete set of movements. For many fish and decapod crustaceans that live inshore, a tri-phasic life cycle is widespread, involving: (1) the movement of planktonic eggs and larvae to nursery areas; (2) a range of routine shelter and foraging movements that maintain a home range; and (3) spawning migrations away from the home range to close the life cycle. Additional complexity is found in migrations that are not for the purpose of spawning and movements that result in a relocation of the home range of an individual that cannot be defined as an ontogenetic shift. Tracking and tagging studies confirm that life cycle movements occur across a wide range of spatial and temporal scales. This dynamic multi-scale complexity presents a significant problem in selecting appropriate scales for studying highly mobile marine animals. We address this problem by first comprehensively reviewing the movement patterns of fish and decapod crustaceans that use inshore areas and present a synthesis of life cycle strategies, together with five categories of movement. We then examine the scale-related limitations of traditional approaches to studies of animal-environment relationships. We demonstrate that studies of marine animals have rarely been undertaken at scales appropriate to the way animals use their environment and argue that future studies must incorporate animal movement into the design of sampling strategies. A major limitation of many studies is that they have focused on: (1) a single scale for animals that respond to their environment at multiple scales or (2) a single habitat type for animals that use multiple habitat types. We develop a hierarchical conceptual framework that deals with the problem of scale and environmental heterogeneity and we offer a new definition of 'habitat' from an organism-based perspective. 
To demonstrate that the conceptual framework can be applied, we explore the range of tools that are currently available both for measuring animal movement patterns and for mapping and quantifying marine environments at multiple scales. The application of a hierarchical approach, together with the coordinated integration of spatial technologies, offers an unprecedented opportunity for researchers to tackle a range of animal-environment questions for highly mobile marine animals. Without scale-explicit information on animal movements, many marine conservation and resource management strategies are less likely to achieve their primary objectives.

    Power-Bounded HPC Performance Optimization (Dagstuhl Perspectives Workshop 15342)

    This report documents the program and the outcomes of Dagstuhl Perspectives Workshop 15342 "Power-Bounded HPC Performance Optimization". The workshop consisted of two parts. In part one, our international panel of experts in facilities, schedulers, runtime systems, operating systems, processor architectures and applications provided thought-provoking and detailed insights into open problems in each of their fields with respect to the workshop topic. These problems must be resolved in order to achieve a useful power-constrained exascale system, which operates at the highest performance within a given power bound. In part two, the participants split into three groups to address specific subtopics identified during the expert plenaries. These subtopics were discussed in more detail, followed by plenary sessions to compare and synthesize the findings into an overall picture. As a result, the workshop identified three major problems which need to be solved on the way to power-bounded HPC performance optimization.

    Theory and practice of dynamic voltage /frequency scaling in the high performance computing environment

    This dissertation provides a comprehensive overview of the theory and practice of Dynamic Voltage/Frequency Scaling (DVFS) in the High Performance Computing (HPC) environment. We summarize the overall problem as follows: how can the same level of computational performance be achieved using less electrical power? Equivalently, how can computational performance be increased using the same amount of electrical power? In this dissertation we present performance and architecture models of DVFS as well as the Adagio runtime system. The performance model recasts the question as an optimization problem that we solve using linear programming, thus establishing a bound on potential energy savings. The architectural model provides a low-level explanation of how memory bus and CPU clock frequencies interact to determine execution time. Using insights provided by these models, we have designed and implemented the Adagio runtime system. This system realizes near-optimal energy savings on real-world scientific applications without the use of training runs or source code modification, and under the constraint that only negligible delay will be tolerated by the user. This work has opened up several new avenues of research, and we conclude by enumerating them.
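    The architectural intuition can be sketched with a simple two-component timing model: only the CPU-bound portion of a task scales with clock frequency, while memory-bound time does not, so a sufficiently memory-bound task can run at a lower frequency within a user-tolerated delay. This is an illustrative model with hypothetical numbers, not Adagio's actual algorithm.

```python
# Sketch of a frequency-selection heuristic under a CPU/memory split
# timing model (illustrative only; not the Adagio runtime).

def predicted_time(t_cpu, t_mem, f, f_max):
    """Execution time if CPU-bound work scales inversely with clock
    frequency and memory-bound work is frequency-independent."""
    return t_cpu * (f_max / f) + t_mem

def pick_frequency(t_cpu, t_mem, freqs, f_max, max_slowdown=1.05):
    """Lowest available frequency whose predicted time stays within
    max_slowdown of the time at the maximum frequency."""
    base = predicted_time(t_cpu, t_mem, f_max, f_max)
    for f in sorted(freqs):  # try the slowest (most energy-saving) first
        if predicted_time(t_cpu, t_mem, f, f_max) <= max_slowdown * base:
            return f
    return f_max

# A memory-bound task (1 s CPU-bound, 9 s memory-bound) tolerating 5%
# delay can drop well below the 2.4 GHz maximum:
print(pick_frequency(1.0, 9.0, [1.2e9, 1.8e9, 2.4e9], f_max=2.4e9))
```

    The same model makes clear why a CPU-bound task gains nothing: any frequency reduction translates almost directly into delay.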

    Applying High-Performance Computing to Multi-Area Stochastic Unit Commitment for Renewable Energy Integration

    We present a parallel implementation of Lagrangian relaxation for solving stochastic unit commitment subject to uncertainty in renewable power supply and in generator and transmission line failures. We describe a scenario selection algorithm inspired by importance sampling in order to formulate the stochastic unit commitment problem, and validate its performance by comparing it to a stochastic formulation with a very large number of scenarios, which we are able to solve through parallelization. We examine the impact of narrowing the duality gap on the performance of stochastic unit commitment and compare it to the impact of increasing the number of scenarios in the model. We report results on the running time of the model and discuss the applicability of the method in an operational setting.
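    The spirit of importance-sampling-based scenario selection can be sketched as follows: rather than sampling scenarios by probability alone, rank them by probability times cost impact, so rare but expensive contingencies are retained, then renormalize the kept probabilities. The helper below is hypothetical and much simpler than the paper's algorithm.

```python
# Illustrative scenario selection in the spirit of importance sampling
# (hypothetical helper; the paper's algorithm is more involved).

def select_scenarios(scenarios, k):
    """scenarios: list of (name, probability, cost_impact) tuples.
    Keep the k scenarios with the largest probability * cost_impact
    (the importance weight) and renormalize their probabilities so the
    reduced scenario set still sums to 1."""
    ranked = sorted(scenarios, key=lambda s: s[1] * s[2], reverse=True)
    kept = ranked[:k]
    total_p = sum(p for _, p, _ in kept)
    return [(name, p / total_p) for name, p, _ in kept]

# Hypothetical contingencies: the rare-but-expensive line outage is
# kept ahead of the rarer, less costly generator outage.
print(select_scenarios(
    [("no-failure", 0.90, 10.0),    # likely, cheap
     ("line-out",   0.08, 500.0),   # rare, expensive
     ("gen-out",    0.02, 300.0)],  # rarer, less expensive
    k=2))
```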